Abstract. Maximizing robustness and minimizing cost are common objectives in the
design of infrastructure networks. However, most infrastructure networks evolve and 
operate in a highly decentralized fashion, which may significantly impact the allocation 
of resources across the system. Here we investigate this question by focusing on the 
relation between capacity and load in different types of real-world communication and
transportation networks. We find strong empirical evidence that the actual capacity 
of the network elements tends to be similar to the maximum available capacity if the 
cost is not strongly constraining. As more weight is given to the cost, however, the 
capacity approaches the load nonlinearly. In particular, all systems analyzed show 
a larger unoccupied portion of the capacity on network elements subjected to smaller
loads, which is in sharp contrast with the assumptions involved in (linear) models 
proposed in previous theoretical studies. We describe the observed behavior of the 
capacity-load relation as a function of the relative importance of the cost by using 
a model that optimizes capacities to cope with network traffic fluctuations. These 
results suggest that infrastructure systems have evolved under pressure to minimize 
local failures but not necessarily global failures that can be caused by the spread of 
local damage through cascading processes. 
1. Introduction

Various problems of immediate social and economic interest, ranging from the
likelihood of power outages [1] and Internet congestion [2] to the affordability of public
transportation [3], are ultimately constrained by the extent to which the assignment 
of capabilities matches supply and demand under realistic conditions. Continuous 
effort has been made to enhance the performance and limit the cost of individual 
system components, such as power transmission lines, computer routers and roads, 
with outcomes impacting the efficiency of virtually all infrastructure systems. Yet, the 
relation between the large-scale allocation and actual usage of resources in distributed 
infrastructure systems remains essentially unexplored and unexplained. For example, 
in a system as costly and up-to-date as the Internet router network, we find that on 
average more than 94% of the available bandwidth remains unused, which is comparable 
to the usage of data networks reported in previous studies [4].

We investigate this problem by focusing on the relationship between capacity and 
load. We first note that the activity of many infrastructure systems can be modeled as 
a network transport process. For example, website browsing and e-mail communication 
are based on packet transport through the Internet, movement of people and goods is 
heavily based on road, rail, and air transportation networks, while the provision of public 
utility services depends on the transport of energy, water and gas carried by power grids 
and other supply networks. In these examples, the transport of packets, passengers, and 
physical quantities creates traffic loads that must be handled by nodes and links of the 
underlying networks. In order to provide stable functioning, the capacities of nodes and 
links have to be large enough to handle the loads under variable conditions. On the other 
hand, the capacities are limited by cost and availability of resources, which increases the 
probability of failures if loads increase. Proper allocation of capacities is thus a basic 
requirement for the robust and efficient operation of infrastructure networks. 

The recent realization that numerous systems can be modeled within the common framework of complex networks [5, 6] has stimulated many theoretical studies on structural resilience [7-12] and congestion problems [13-17] as well as cascading failure [18-27] and cascade control analysis [28-34]. Studies of air transportation networks [35, 36], in particular, have shown that the strengths of the network connections may follow a pattern partially determined by the network topology [37, 38]. However, despite much progress, the determination of the capacity and load characteristics of real networks is a question that goes beyond previous complex network research.

Here we study this question from the perspective of a decentralized optimization between robustness and cost. We analyze four types of infrastructure networks: the air transportation, highway, power-grid and Internet router systems. We find empirically that the capacity-load relation is mainly determined by the relative importance given to the cost and exhibits an unanticipated nonlinear behavior, which, as shown schematically in Figure 1, is very different from the constant [20, 21], random [18, 27], and linear [19, 22, 24, 25, 26, 30] assignments of capacities considered in previous models (see also [31, 32, 33]). We study this nonlinearity using the concept of unoccupied capacity, which we define as the difference between the capacity and the time-averaged load. It follows that the percentage of unoccupied capacity is smaller for network elements with larger capacities.

Figure 1. Capacity allocation pattern in a sampled part of the US power-grid network. The color and width of the links indicate the load and capacity of the transmission lines, respectively. The top left panel indicates the typical overall capacity-load relation observed in real infrastructure networks, where components with larger capacity have a larger load-to-capacity ratio.

We demonstrate the observed behavior using a traffic model devised to minimize 
the probability of overloads in a scenario of fluctuating traffic and limited availability of 
resources. This model accommodates the interpretation that the empirical distributions 
follow from a decentralized evolution in which capacities and loads are (re)allocated in response to network stress caused by increasing load demand. In the US power grid, for example, the load demand increases by 2% per year and detectable blackouts occur on average once every 13 days [39, 40]. Despite being driven by a decentralized process, the long-term accumulation of local changes can give rise to an organized pattern in the capacity-load relation, suggesting similarities between network evolution and other self-organized phenomena [39, 40, 41]. In particular, our model shows that the reduction of the unoccupied capacity in high-capacity elements is mainly a consequence of the reduction of the traffic fluctuations for higher loads, but it also shows that the probability of overloads can be larger on elements with larger capacities. These results should enable researchers to build models to gain insights into the evolution of decentralized systems and, in particular, to evaluate the impact of disturbances in large communication and transportation networks.

2. Empirical Data and Capacity-Load Characteristics 

We investigate the properties of load and capacity distributions in four different types 
of real-world infrastructure networks: power-grid network, highway network, Internet 
router network, and air transportation networks. In each of these systems, the load 
and capacity are defined considering the type of traffic on the network, namely, electric 
power in the power grid, traffic of vehicles in the highway network, packet flow in the 
Internet, and passengers and aircraft in the air transportation networks. The load 
represents an averaged quantity over a period of time in all the networks considered. 
Figure 2 shows the relationship between the load and capacity for the network elements
in the respective systems. In the analysis of the real data, we find that the capacity-load
relation depends on the specific network, but the pattern of this dependence can
be understood as the result of a trade-off between the cost of the capacities and the 
robustness of the network. In the following, we discuss in detail the datasets examined 
and the empirical capacity-load characteristics. 

2.1. Air Transportation Networks 

2.1.1. Airway Network. We analyze the aviation data available at the Bureau of Transportation Statistics database (http://www.bts.gov), which contains the seat occupation data of US and foreign aircraft operating between 1449 US and foreign airports in the year 2005. Flights with both origin and destination outside the US are not included in the database. The load L and the capacity C of an airway connecting two airports are defined as the total number of occupied and available seats of all flights connecting the airports, respectively. Figure 2(a) shows that the airway network has
a very efficient capacity distribution. While there are data points with the capacity 
larger than the load, the capacity-load relation is very close to the line of maximum 
efficiency C(L) = L. This efficiency is likely to be related to the high costs of air 
transportation, which create strong incentives for the airline companies to operate with 
a high occupancy factor. 

2.1.2. Airport Network. We also examine a different dataset obtained from the
International Air Transport Association (IATA) [42], which reflects the operation
and physical capacity of 90 major international airports in 2002. In the airway 
network considered above, the load and the capacity are defined for links (airways) 
as the total occupied and total available number of seats in flights connecting two 
airports, respectively. In contrast, for this dataset, we define the load and capacity 
Figure 2. Capacity-load relation of real infrastructure networks. (a) Total number of occupied (L) versus available seats (C) in aircraft departing from and arriving at 1449 US and international airports in 2005. (b) Peak-hour aircraft movements (L) and nominal peak-hour capacity (C) of 90 international airports in 2002. (c) Design hourly volume (L) versus estimated capacity (C) of 1559 Colorado highway segments in 2005. (d) Apparent power (L) versus maximum apparent power (C) of 5855 transmission lines in the power grid of Texas in the summer peak of 2000. (e) Monthly averaged traffic (L) versus bandwidth (C) of the 721 router interfaces of the ABILENE backbone, MIT and Princeton University networks in June 2006. The filled boxes with curve fits indicate the averaged capacity-load relation C(L) calculated on a logarithmic scale. The line of maximum efficiency C = L (dashed line) is shown for comparison with the data.



for nodes (airports) as the peak-hour aircraft movements (arrivals + departures) and the corresponding capacity declared by each airport. We call this network the airport network to distinguish it from the airway network. Figure 2(b) shows that the capacity-load relation of the airport network is close to the line C(L) = L, indicating that very large airports tend to operate close to full capacity†.

† Because the declared capacity is not necessarily a sharp limit, some airports can be found to operate above capacity in Figure 2(b).
2.2. Highway Network

We examine the traffic data of the state of Colorado in the year 2005 for 1559 
highway segments available at the Colorado Department of Transportation database 
(http://www.dot.state.co.us). For each segment, we define the subjected load L as the design hourly volume (the 30th highest annual hourly traffic volume) in units of the number of cars per hour, which is regarded as the typical peak-hour traffic volume for operational and design purposes [43]. Since the capacity itself is not available in the database, C is estimated from the volume-to-capacity ratio (L/C) and the design hourly volume L. Figure 2(c) indicates that the capacity-load relation of the highway
network is different from that of the airway network in the region of small loads. While 
the capacity is close to the load for the highway segments with large loads, there are 
many secondary segments with capacities much larger than their loads. This indicates 
that, as compared to the air transportation network, efficiency has a lower priority in the 
highway network. In this system, the segments with C ≫ L may provide an alternative
route for congested traffic and attenuate jamming in peak hours. However, the behavior 
C ~ L in the large L region suggests that the cost is also an important limiting factor 
in the construction and maintenance of the highway system. 



2.3. Power-Grid Network 

We consider the power-grid network of the Electric Reliability Council of Texas (http://www.ercot.com) and analyze the maximum apparent power ($S_{\max}$), the real power (P) and the reactive power (Q) measured at the summer peak of the year 2000 for 5885 transmission lines. We define the load and the capacity of a transmission line as the apparent power ($L = \sqrt{P^2 + Q^2}$) and its maximum allowed value ($C = S_{\max}$) in units of MVA, respectively. Figure 2(d) shows a pattern similar to the one observed in the highway network: there exists an abundance of transmission lines with capacities much larger than the loads. In a power grid, robustness may be even more important than in a highway network because once a failure cascades, as in the August 14, 2003 North America blackout, it can cause losses on a very large scale. Compared to the highway network, the power grid has larger unoccupied capacities for the heavily loaded components of the network, a feature that is useful in this case for the dispatch of power generation to adjust to specific market, weather, and demand conditions.



2.4. Internet Router Network

We analyze the average packet traffic data in June 2006 monitored by the Multi Router Traffic Grapher (http://oss.oetiker.ch/mrtg/) in the 721 routers of the ABILENE backbone, MIT, and Princeton University networks. We define the load L and the capacity C of a router respectively as the monthly average of occupied and available bandwidth of the network interface of the router in units of bps (bit/s). Figure 2(e) shows a weaker dependence of the capacity on the load than those found for the other networks, a property partially explained by the discreteness of the capacities in the Internet router network. There are, indeed, only a few classes of router interfaces commercially available, with bandwidths of 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps, and so on. Routers in the same group, such as the same university or the same backbone network, tend to be upgraded simultaneously to a higher capacity class, regardless of their actual individual loads. The substantial upgrades required by the fast-growing bandwidth demand contribute to the observed large margin of unoccupied capacity. Thus, compared to the other networks, one can argue that in the Internet router system robustness is prioritized over cost.

Figure 3. Averaged capacity-load relation of the air transportation networks, highway network, power-grid network, and Internet router network along with the corresponding efficiency coefficient e. The data for the load are log-binned to obtain the averaged relation C(L). For a comparison between networks having different ranges of capacity and load, the data are normalized by the maximum load value $L_{\max}$. Data points with very small load ($L/L_{\max} < 10^{-3}$) do not affect the tendency and are not shown.

2.5. Efficiency Coefficient 

The analysis above provides evidence that the capacity allocation pattern can be traced 
back to the importance of the cost in the construction and maintenance of the system. 
For a quantitative characterization, we introduce the efficiency coefficient e of the network, which we define as the ratio between the total load $\sum_i L_i$ and the total capacity $\sum_i C_i$ of the system:

$$e = \frac{\sum_i L_i}{\sum_i C_i}. \tag{1}$$
This quantity serves as a measure of the importance of the cost versus robustness§. As
the cost becomes more important, the capacity is expected to be set closer to the load
to prevent overallocation of resources, which increases e.
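As a minimal illustration of Eq. (1), the Python sketch below evaluates e for hypothetical toy data (the numbers are invented for illustration and are not drawn from the datasets above): the same loads served by tight and by generous capacities, and a mixed network in which one heavily loaded element sits beside many lightly used ones.

```python
def efficiency_coefficient(loads, capacities):
    """Efficiency coefficient e = sum_i L_i / sum_i C_i, following Eq. (1)."""
    if len(loads) != len(capacities) or not loads:
        raise ValueError("loads and capacities must be non-empty and of equal length")
    return sum(loads) / sum(capacities)


# Hypothetical toy data (not from the networks analyzed in this paper).
loads = [10.0, 100.0, 1000.0]
tight = [12.0, 110.0, 1050.0]        # capacities set close to the loads (cost-driven)
generous = [100.0, 1000.0, 10000.0]  # tenfold margins everywhere (robustness-driven)

print(round(efficiency_coefficient(loads, tight), 3))     # 0.947
print(round(efficiency_coefficient(loads, generous), 3))  # 0.1

# Ten lightly used elements next to one heavily loaded element:
# both sums, and hence e, are dominated by the largest element.
mixed_loads = [1.0] * 10 + [1000.0]
mixed_caps = [10.0] * 10 + [1050.0]
print(round(efficiency_coefficient(mixed_loads, mixed_caps), 3))  # 0.878
```

Because e is a ratio of sums, the mixed example yields e ≈ 0.88 even though ten of its eleven elements use only 10% of their capacity.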

This tendency is confirmed in Figure 3, where we show how the averaged capacity-load relation C(L) depends on the efficiency coefficient e for all networks analyzed. The efficiency coefficient is e = 0.73 (airways) and e = 0.90 (airports) for the air transportation networks, 0.54 for the highway network, 0.29 for the power-grid network, and 0.06 for the Internet router network. The extremely high efficiency coefficient of the airport network may be partially related to the fact that this database refers to major airports only. However, as illustrated in the case of the power-grid and highway networks, which have different trends for small L, the overall efficiency coefficient can be dominantly determined by the most loaded elements in the network (note the logarithmic scale in Figure 3)‖.
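The averaged relation C(L) shown in the figures is obtained by log-binning the loads. A minimal sketch of that procedure follows; the function name, the number of bins per decade, the use of arithmetic means within each bin, and the toy data are assumptions of this illustration rather than details taken from the analysis.

```python
import math
from collections import defaultdict


def log_binned_capacity(loads, capacities, bins_per_decade=4, normalize=False):
    """Average the capacity C over logarithmically spaced bins of the load L.

    Returns (mean L, mean C) pairs for the non-empty bins, ordered by load.
    With normalize=True, both L and C are divided by the maximum load L_max.
    Arithmetic means within each bin are an assumption of this sketch.
    """
    if len(loads) != len(capacities):
        raise ValueError("loads and capacities must have equal length")
    if any(L <= 0 for L in loads):
        raise ValueError("log-binning requires strictly positive loads")
    scale = max(loads) if normalize else 1.0
    bins = defaultdict(lambda: [0.0, 0.0, 0])  # bin index -> [sum of L, sum of C, count]
    for L, C in zip(loads, capacities):
        L, C = L / scale, C / scale
        b = math.floor(math.log10(L) * bins_per_decade)
        acc = bins[b]
        acc[0] += L
        acc[1] += C
        acc[2] += 1
    return [(s_L / n, s_C / n) for _, (s_L, s_C, n) in sorted(bins.items())]


# Hypothetical toy data in which the ratio C/L shrinks as the load grows.
loads = [1.5, 2.5, 15.0, 25.0, 150.0]
capacities = [15.0, 25.0, 60.0, 100.0, 300.0]
for L, C in log_binned_capacity(loads, capacities, bins_per_decade=1):
    print(f"L = {L:6.1f}  C = {C:6.1f}  C/L = {C / L:.1f}")
```

In this toy example, C/L falls from 10 to 2 across two decades of load, which is the kind of nonlinear capacity-load relation that the binned curves make visible.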

3. Capacity Optimization and Traffic Fluctuation Model 

Having identified the capacity-load characteristics of real-world networks, we now study 
the empirical findings using a theoretical model based on optimizing the capacity and 
the cost at the level of individual nodes or links. We define a simple objective function $F_i$ for node (or link) i as

$$F_i = (1-w)\,R_i(C_i) + w\,S_i(C_i), \tag{2}$$

where $R_i(C_i)$ and $S_i(C_i)$ are the robustness and the cost function, respectively, and the weight factor $w \in [0, 1]$ represents the importance of the cost. If we choose the robustness $R_i$ (cost $S_i$) to be a decreasing (increasing) function of $C_i$, the minimization of $F_i$ will lead to an optimized capacity for node i subjected to the time-averaged load $L_i$, which defines a capacity-load relation C(L).

In order to determine the robustness function, it is important to identify the 
main source of perturbation affecting the system. While the information about the 
entire network is quite limited in general, the local time-variations of load provide 
valuable information about the vulnerability of the individual network components. 
Here we consider the fluctuation of traffic over time as the sole perturbation that can cause accidental failure or malfunctioning due to overloading. Traffic fluctuation is a fundamental and ubiquitous property of traffic dynamics, which has been studied for a wide range of real networks [45, 46]. Recent studies have reported that in Internet routers, microchips, rivers and highways, the average traffic load L and the standard deviation of the load σ are related through the scaling $\sigma \sim L^{\alpha}$ with $\alpha = 0.5$ to $1.0$ [45, 47, 48]. This further indicates that the capacity designed to tolerate traffic variability can be expressed in terms of the average load.

§ Note that the efficiency coefficient accounts for the usage of the available capacities rather than the efficiency of the routing algorithm [44].

‖ To further examine the contribution of large loads and capacities, we also considered a modified definition of the efficiency coefficient, $e' = \sum_i \log L_i / \sum_i \log C_i$, which leads to e' = 0.97 (airports), 0.92 (airways), 0.84 (highways), 0.67 (power grid) and 0.57 (Internet routers). This indicates that, while the value of the efficiency coefficient itself would change, the tendency across the systems remains the same even if less emphasis is given to the elements with large L and C.

We model the traffic fluctuations using a simple transport process in which a certain amount $x_{jk}(t)$ of traffic is created at a source node j at time t and moves to a destination node k along a predetermined path. This process includes as a special case the directed flow model introduced in a previous study [18], where random pairs of source and destination nodes are selected for the creation of traffic at each time step and the traffic moves along the shortest path. Here we consider a more general yet mathematically tractable model that is less dependent on the details of the network structure and routing rules. A simple microscopic description of our model has been anticipated in [49].

In our model, we define the load $l_i(t)$ as the amount of traffic processed per unit time at node i, measured during a time window δt. Formally, we can write the load $l_i(t)$ as

$$l_i(t) = \frac{1}{\delta t}\int_t^{t+\delta t} dt' \int_{-\infty}^{t'} dt'' \sum_{j,k} x_{jk}(t'')\,\Phi_{jk}(t''; i, t'), \tag{3}$$

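where $\Phi_{jk}(t''; i, t')$ is a propagator of the traffic towards node k starting from node j at time t'' and passing through node i at time t'. For transport through predetermined paths, $\Phi_{jk}(t''; i, t')$ can be rewritten using the travel time $t_{ji}^{k}$ elapsed to reach node i as $z_i^{jk}\,\delta(t'' + t_{ji}^{k} - t')$, where $z_i^{jk}$ is 1 if the traffic from j to k passes through i and 0 otherwise. Then, $l_i(t)$ can be simplified as

$$l_i(t) = \sum_{j,k} z_i^{jk}\,\frac{1}{\delta t}\int_t^{t+\delta t} x_{jk}(t' - t_{ji}^{k})\,dt' = \sum_{j,k} z_i^{jk}\, r_{jk}(t - t_{ji}^{k}), \tag{4}$$

where $r_{jk}(t) \equiv \frac{1}{\delta t}\int_t^{t+\delta t} x_{jk}(t')\,dt'$.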

To identify the stochastic characteristics of $l_i(t)$, which we use as a reference for capacity determination, the measurement time δt can be chosen to be of the order of the autocorrelation time of the load fluctuation $\sum_{j,k} z_i^{jk} x_{jk}(t - t_{ji}^{k})$. Then, we can obtain the distribution of loads $P_i(l_i)$ for a large number of δt intervals and thereby the overloading probability, which can be derived as

$$\xi_i(C_i) = \mathrm{Prob}[l_i > C_i] = \int_{C_i}^{\infty} P_i(l_i)\,dl_i \tag{5}$$

for a given capacity $C_i$ such that $L_i \le C_i \le C_{\max}$, where $L_i = \int l_i\,P_i(l_i)\,dl_i$. We assume that the capacity $C_i$ is physically upper-bounded by the maximum value $C_{\max}$ and is lower-bounded by the line of maximum efficiency C = L.
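In practice, Eq. (5) can be estimated directly from a record of per-window loads. The sketch below counts the fraction of windows in which the load exceeds a given capacity and compares it with the Gaussian tail for synthetic, normally distributed loads; the helper name and the test parameters are illustrative assumptions.

```python
import math
import random


def overload_probability(load_samples, capacity):
    """Empirical estimate of xi_i(C_i) = Prob[l_i > C_i], Eq. (5).

    load_samples holds one measured load l_i per window of length delta t.
    """
    if not load_samples:
        raise ValueError("need at least one load sample")
    return sum(1 for l in load_samples if l > capacity) / len(load_samples)


# Hypothetical check: Gaussian loads with mean 100 and standard deviation 10.
rng = random.Random(1)
samples = [rng.gauss(100.0, 10.0) for _ in range(200_000)]
estimate = overload_probability(samples, 120.0)
exact = 0.5 * math.erfc((120.0 - 100.0) / (10.0 * math.sqrt(2.0)))  # Gaussian tail
print(f"empirical {estimate:.4f}  vs  Gaussian tail {exact:.4f}")
```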

We choose the overloading probability as the robustness function so that $R_i(C_i) = \xi_i(C_i)$, where a smaller overloading probability represents a higher robustness in a probabilistic sense. The cost function is defined for concreteness as a linear function of the capacity, $S_i(C_i) = C_i/C_{\max}$. Therefore, we can rewrite the objective function explicitly as

$$F_i = (1-w)\,\xi_i(C_i) + w\,\frac{C_i}{C_{\max}}, \tag{6}$$

where $L_i \le C_i \le C_{\max}$. The optimized distribution of capacities can now be calculated by minimizing this objective function.
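Since Eq. (6) involves a single variable on a bounded interval, its minimizer can be located numerically for any overloading-probability function. The sketch below uses a brute-force grid search (chosen for transparency, not taken from the paper) and checks it on a hypothetical exponential tail $\xi(C) = \exp[-(C - L)/\lambda]$, for which setting $dF/dC = 0$ gives $C^* = L + \lambda \ln[(1-w)C_{\max}/(w\lambda)]$ whenever the logarithm's argument exceeds 1.

```python
import math


def optimal_capacity(xi, load, c_max, w, n_grid=100_000):
    """Minimize F(C) = (1 - w) * xi(C) + w * C / c_max over load <= C <= c_max, Eq. (6).

    xi is any overloading-probability function of C. A brute-force grid search
    is used purely for illustration; it is robust but not efficient.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if not 0.0 < load <= c_max:
        raise ValueError("need 0 < load <= c_max")
    best_c, best_f = load, math.inf
    for n in range(n_grid + 1):
        c = load + (c_max - load) * n / n_grid
        f = (1.0 - w) * xi(c) + w * c / c_max
        if f < best_f:
            best_c, best_f = c, f
    return best_c


# Hypothetical exponential tail with decay scale lam; compare with the closed form.
L, lam, c_max, w = 100.0, 10.0, 1000.0, 0.01
c_num = optimal_capacity(lambda c: math.exp(-(c - L) / lam), L, c_max, w)
c_exact = L + lam * math.log((1.0 - w) * c_max / (w * lam))
print(round(c_num, 2), round(c_exact, 2))
```

When w = 1 the objective reduces to the cost alone and the sketch returns C = L, the line of maximum efficiency; as w decreases, the optimum moves toward $C_{\max}$.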

In order to explicitly calculate the capacity-load relation within this optimization 
model, we consider both uncorrelated and synchronized traffic fluctuations under the 
condition that every traffic creation event shares identical statistical properties. In the 
case of the uncorrelated fluctuations, a traffic creation event at a node is statistically 
uncorrelated with those at the other nodes. In the case of the synchronized fluctuations, 
on the other hand, we assume that every node creates the same amount of traffic 
simultaneously. It is worth considering both types of fluctuations in view of the recent empirical evidence [17] that random internal fluctuations can be strongly modulated by external driving forces. In the Internet backbone, for example, it has been observed that the traffic dynamics is well characterized by a Poisson process on millisecond time scales, while long-range correlations appear on longer time scales [50].



3.1. Uncorrelated Fluctuations 



We consider uncorrelated fluctuations in which the amount of traffic $r_{jk}(t)$ created at the source node j and moving to the destination node k is completely uncorrelated with the traffic between different source-destination node pairs. In this regime, the quantity $r_{jk}(t)$ can be regarded as an independent identically distributed random variable r following a probability distribution p(r). Therefore, assuming that p(r) has finite moments, including average $\bar r$ and variance $s^2$, we apply the Central Limit Theorem [51] to calculate the distribution of loads. This leads to a Gaussian distribution of loads

$$P_i(l_i) \simeq \frac{1}{\sigma_i\sqrt{2\pi}}\exp\!\left[-\frac{(l_i - L_i)^2}{2\sigma_i^2}\right] \tag{7}$$

with the average $L_i = \bar r z_i = \bar r \sum_{j,k} z_i^{jk}$ and the variance $\sigma_i^2 = s^2 z_i$. Note that the relation $\sigma_i \sim L_i^{1/2}$, a corollary of (7), is in agreement with the results of previous studies [45, 47, 48].

Using (7), we can obtain the solution of the optimized capacity-load relation C(L) by minimizing F in (6). The resulting capacity-load relation is expressed as $C(L) = \min\{C'(L), C_{\max}\}$ with

$$C'(L) = \begin{cases} L + g\sqrt{L \ln V(L)} & \text{if } L < L_w,\\ L & \text{if } L \ge L_w, \end{cases} \tag{8}$$

where

$$V(L) = \frac{(1-w)\,C_{\max}}{\sqrt{\pi}\,w\,g\,\sqrt{L}}, \tag{9}$$

the parameter g denotes $\sqrt{2s^2/\bar r}$, and $L_w$ satisfies $V(L_w) = 1$.
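Equations (8) and (9) can be evaluated directly. The sketch below does so and reproduces the nonlinearity described in the text: the unoccupied fraction (C - L)/C shrinks as L grows. The parameter values are illustrative, apart from g = 3, which is the value quoted for Figure 4(a); the function names are hypothetical.

```python
import math


def V(L, w, g, c_max):
    """V(L) from Eq. (9)."""
    return (1.0 - w) * c_max / (math.sqrt(math.pi) * w * g * math.sqrt(L))


def capacity_uncorrelated(L, w, g, c_max):
    """Optimized capacity C(L) = min{C'(L), C_max}, with C'(L) from Eq. (8)."""
    if w == 0.0:
        return c_max  # cost ignored: the maximum capacity is always optimal
    v = V(L, w, g, c_max)
    # v <= 1 is equivalent to L >= L_w, where the optimum sits on C = L.
    c_prime = L + g * math.sqrt(L * math.log(v)) if v > 1.0 else L
    return min(c_prime, c_max)


g, c_max, w = 3.0, 1000.0, 0.01
for L in (1.0, 10.0, 100.0):
    C = capacity_uncorrelated(L, w, g, c_max)
    print(f"L = {L:5.1f}  C = {C:7.2f}  unoccupied fraction = {(C - L) / C:.2f}")
```

The closed form follows from setting dF/dC = 0 with the Gaussian tail of (7), using $\sigma\sqrt{2} = g\sqrt{L}$; a brute-force minimization of (6) gives the same capacities to within the grid resolution.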



3.2. Synchronized Fluctuations 



To implement synchronized fluctuations within our model, we assume the following properties of the traffic variables. First, while the uncorrelated fluctuations occur on a short time scale, the modulation that we attempt to describe by the synchronized fluctuations occurs on a much longer time scale [17, 50]. Such synchronization can generally be triggered by exogenous factors, such as weather and seasonal conditions, or by collective behavior. Second, we assume that the travel time $t_{ji}^{k}$ is much shorter than the time scale of the modulation. Then, neglecting the travel time and using that the synchronized traffic creation $x_{jk}(t)$ can be represented by a single function x(t), we can set $r_{jk}(t) = r(t)$ to write the load as $l_i(t) = r(t)\,z_i$.

Assuming statistical independence of the values of r(t) in different modulation periods, we consider the stochastic characteristics of the peak value of r(t) defined in each modulation period as a reference for capacity determination. Given the distribution p(r) of the peak values in many different modulation periods, we can write the overloading probability $\xi_i$ for a given capacity as

$$\xi_i(C_i) = \int_{C_i}^{\infty} P(l_i)\,dl_i = \int_{\bar r C_i/L_i}^{\infty} p(r)\,dr, \tag{10}$$

where the average load is $L_i = \bar r z_i$ and $\bar r = \int r\,p(r)\,dr$.

By minimizing the objective function F in (6), the optimized capacity is obtained as $C = \min\{C'(L), C_{\max}\}$ with

$$C'(L) = \begin{cases} \dfrac{L}{\bar r}\, q\!\left(\dfrac{wL}{(1-w)\,\bar r\, C_{\max}}\right) & \text{if } L < L_w,\\ L & \text{if } L \ge L_w, \end{cases} \tag{11}$$

where $L_w = \frac{1-w}{w}\,\bar r\,C_{\max}\max_r p(r)$ and $q(y) = r$ is obtained by inverting $y = p(r)$. For p(r) having a single maximum, the equation y = p(r) generally has two solutions for r; we conventionally select for q(y) the larger value of r, which gives the larger capacity.

The distribution p(r) is the final ingredient for the explicit calculation of C(L) in the synchronized fluctuation regime. Because we have defined r as the maximum value of many traffic creation events, the distribution of maxima in extreme value statistics can be used as an input for p(r) in the model. Here we numerically calculate C(L) for the Gumbel distribution $p_g(r)$ and the Fréchet distribution $p_f(r)$, referred to as the first and second asymptotes in the extreme value statistics literature [52], which are written as

$$p_g(r) = \frac{1}{\beta}\exp\!\left[-\frac{r-\mu}{\beta} - e^{-(r-\mu)/\beta}\right], \tag{12}$$

$$p_f(r) = \frac{\gamma}{\alpha}\left(\frac{r}{\alpha}\right)^{-1-\gamma}\exp\!\left[-\left(\frac{r}{\alpha}\right)^{-\gamma}\right], \tag{13}$$

where all parameters are positive. These extreme value distributions cover two types of unbounded initial distributions, the exponential type for $p_g$ and the power-law type for $p_f$. The third asymptote is for strictly bounded initial distributions and gives similar results as the bound of the created traffic becomes large.
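For the synchronized regime, Eq. (11) requires the inverse q(y) on the upper branch of p(r). A minimal sketch for the Gumbel case (12) follows; it inverts $p_g$ by bisection on $r > \mu$, where $p_g$ is monotonically decreasing, and uses the Gumbel mean $\bar r = \mu + \gamma_E \beta$, with $\gamma_E$ the Euler-Mascheroni constant. The function names and the values of w and $C_{\max}$ are illustrative assumptions; $(\mu, \beta) = (100, 20)$ matches the values quoted for Figure 4(b).

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant (mean of the standard Gumbel)


def gumbel_pdf(r, mu, beta):
    """Gumbel density p_g(r) of Eq. (12)."""
    z = (r - mu) / beta
    return math.exp(-z - math.exp(-z)) / beta


def q_upper(y, mu, beta, iters=200):
    """Larger root r of y = p_g(r), for 0 < y < max_r p_g(r) = 1 / (beta * e)."""
    lo, hi = mu, mu + beta  # p_g decreases monotonically for r > mu (its mode)
    while gumbel_pdf(hi, mu, beta) > y:
        hi += beta  # widen the bracket until p_g(hi) <= y
    for _ in range(iters):  # bisection on the decreasing branch
        mid = 0.5 * (lo + hi)
        if gumbel_pdf(mid, mu, beta) > y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def capacity_synchronized(L, w, mu, beta, c_max):
    """C(L) = min{C'(L), C_max}, with C'(L) from Eq. (11) and a Gumbel p(r)."""
    if not (0.0 < w < 1.0 and L > 0.0):
        raise ValueError("need 0 < w < 1 and L > 0")
    r_bar = mu + EULER_GAMMA * beta  # mean of p_g
    y = w * L / ((1.0 - w) * r_bar * c_max)
    if y >= 1.0 / (beta * math.e):  # equivalent to L >= L_w
        return L
    c_prime = (L / r_bar) * q_upper(y, mu, beta)
    return min(max(c_prime, L), c_max)  # keep L <= C <= C_max


for L in (10.0, 100.0, 1000.0):
    C = capacity_synchronized(L, w=0.1, mu=100.0, beta=20.0, c_max=10_000.0)
    print(f"L = {L:7.1f}  C = {C:8.2f}  C/L = {C / L:.3f}")
```

The printed ratio C/L decreases with L, the sublinear behavior that the optimization produces even for synchronized fluctuations. A Fréchet density (13) could be handled in the same way by bisecting on its decreasing branch to the right of its mode.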

Figure 4 shows the numerically calculated capacity-load relation C(L) for
uncorrelated and synchronized fluctuations. In both regimes we find that the allocation
of capacities exhibits characteristics in common with the empirical data. In particular,
Figure 4. Capacity-load relation predicted by the optimization model for different values of the weight w given to the cost: (a) uncorrelated fluctuation regime and (b-c) synchronized fluctuation regimes. The model parameter g = 3 is used for the uncorrelated regime. The extreme value distributions are assumed to be (b) the Gumbel distribution with parameters (μ, β) = (100, 20) and (c) the Fréchet distribution with (α, γ) = (1, 2). The capacity and load are normalized by the predefined maximum



as the weight factor w decreases (reducing the importance of the cost), in all cases the curve C(L) recedes from the line C = L and moves up towards the line $C = C_{\max}$. More importantly, the calculated C(L) shows the common trend that a larger relative deviation from the linear line C = L, representing a larger unoccupied portion of the capacity, is found in the region of smaller L. We note that our model, and hence the conclusions we draw from it, are determined by general statistical properties of the traffic and do not depend on the details of the network structure and dynamics. This generality represents an advantage over previous models based on betweenness centrality because, as shown in Appendix A, the latter is only weakly correlated with the actual flow in the networks and cannot provide information about C(L).

The traffic fluctuations considered in our model reflect a realistic feature of infrastructure networks. However, it remains an open problem to develop a fully realistic model. One possible direction for future research is to consider intermediate regimes that incorporate both uncorrelated and synchronized fluctuations. This is relevant for situations in which a synchronized perturbation occurs together with uncorrelated background fluctuations. Another direction concerns the incorporation of the network-structure dependence of traffic control and capacity determination. In the case of the Internet router network, for example, previous works have shown that capacity and degree are negatively correlated [53], suggesting a potential relation between the network topology and the exceptionally large nonlinearity in the capacity-load relation of Figure 2(e). In addition, it is worth considering the impact of the apparent lower bound in the capacities [54], which may further contribute to the observed nonlinearity.
Figure 5. Overloading probability for the optimized capacities in (a) the uncorrelated and (b-c) synchronized fluctuation regimes. The unspecified parameters are the same as in Figure 4.




Figure 6. Unoccupied capacity (C - L) for the optimized capacities in (a) the uncorrelated and (b-c) synchronized fluctuation regimes. The parameters are the same as in Figure 4.



4. Unoccupied Capacity and Overloading Probability 

Our empirical results are in sharp contrast with the linear capacity-load relation 
hypothesized in previous work, and our model shows that the apparently universal 
nonlinear behavior is a consequence of fluctuations in the traffic load. Because larger 
loads tend to result from a larger number of traffic events, the relative size of the 
fluctuations tends to decrease as the load increases; considering that the unoccupied
capacity is mainly determined by the perturbations caused by traffic fluctuations, this 
explains why the unoccupied portion of the capacity is observed to be smaller in network 
elements with larger loads and capacities. From this perspective, the observed decrease 
in the percentage of unoccupied capacity is a consequence of the law of large numbers. 
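The law-of-large-numbers argument can be checked with a short simulation. In the sketch below, each window's load is the sum of n independent traffic events drawn from a unit exponential distribution (an arbitrary choice for illustration, as is the function name). The relative fluctuation σ/L then falls as $n^{-1/2}$, even though the absolute fluctuation σ grows as $n^{1/2}$.

```python
import random
import statistics


def relative_fluctuation(n_events, n_windows=10_000, seed=0):
    """sigma/L of a load built from n_events i.i.d. traffic events per window.

    Each event is drawn from an exponential distribution with unit mean and unit
    variance (an arbitrary choice for this sketch), so sigma/L should be close to
    1 / sqrt(n_events).
    """
    rng = random.Random(seed)
    loads = [sum(rng.expovariate(1.0) for _ in range(n_events)) for _ in range(n_windows)]
    return statistics.pstdev(loads) / statistics.fmean(loads)


for n in (1, 10, 100):
    print(n, round(relative_fluctuation(n), 3), round(n ** -0.5, 3))
```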

However, the same analysis also reveals two surprising elements. First, the predicted overloading probability calculated in (5) using (8) and (11) is larger for network elements subjected to larger loads, despite the fact that the capacities are also larger and the relative fluctuations are smaller (Figure 5). The explanation for this is that,
Figure 7. Unoccupied capacity (C - L) for the airway, highway, power-grid and Internet router networks. The data are log-binned to obtain the average behavior of C - L as a function of L. Curve fits $(C - L) \sim L^{n}$ are given for each network to compare with the theoretical model. For a comparison between networks having different ranges of capacity and load, the data are normalized by the maximum load $L_{\max}$.



although the relative size of the fluctuations decreases, their absolute size increases as
the load increases. Therefore, the reduction in the unoccupied portion of the capacities
as the load increases is not only a consequence of the decreasing fluctuations but also
partially due to the optimization of capacities. For concreteness we have assumed
that the cost is a linear function of the capacity, but similar or more pronounced
behavior is predicted for superlinear cost functions (Appendix B). Second, the reduction
in the unoccupied portion of the capacities is observed not only when the traffic
events are statistically independent but also when the fluctuations are synchronized.
For synchronized fluctuations, the sublinear behavior of $C(L)$ cannot be anticipated
from (non-optimal) egalitarian solutions determined by an equal probability $\varepsilon = \varepsilon_c$ of
overload for every node, since equal probabilities lead to a linear dependence $C \propto L$
in the capacity-load relation. Setting $\varepsilon = \varepsilon_c$ in (5), the capacity for the egalitarian
solutions is derived as $C = L + \sqrt{2}\,\mathrm{erf}^{-1}(1 - 2\varepsilon_c)\,L^{1/2}$ in the uncorrelated regime, while the
corresponding equation $\int_{C/L}^{\infty} p(r)\,dr = \varepsilon_c$ leads to $C/L = \mathrm{constant}$ in the synchronized
regime. Therefore, while synchronized fluctuations are expected to consume more
resources, the optimization of the capacities makes this less severe by allowing a higher
probability of overloads in components with larger loads. For uncorrelated fluctuations,
the capacity has a sublinear term even for egalitarian solutions.
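The two egalitarian capacity rules can be evaluated directly. For the uncorrelated rule, this sketch uses the identity $\sqrt{2}\,\mathrm{erf}^{-1}(1-2\varepsilon_c) = \Phi^{-1}(1-\varepsilon_c)$, where $\Phi$ is the standard normal distribution function. That identity lets Python's `statistics.NormalDist` stand in for an inverse error function. For the synchronized rule, the sketch uses a Gaussian $p(r)$ with mean 1 and width 0.2. This choice is hypothetical, made only to have something concrete to integrate; the paper does not specify the form of $p(r)$.

```python
import math
from statistics import NormalDist


def egalitarian_uncorrelated(L: float, eps_c: float) -> float:
    """C = L + sqrt(2) erf^{-1}(1 - 2 eps_c) L^{1/2}.

    sqrt(2) * erfinv(1 - 2 eps) equals the standard-normal quantile at 1 - eps,
    so NormalDist().inv_cdf replaces an erfinv routine here.
    """
    z = NormalDist().inv_cdf(1.0 - eps_c)
    return L + z * math.sqrt(L)


def egalitarian_synchronized(L: float, eps_c: float, p_r: NormalDist) -> float:
    """Solve int_{C/L}^inf p(r) dr = eps_c, which gives C = r_c L with P(r > r_c) = eps_c."""
    r_c = p_r.inv_cdf(1.0 - eps_c)
    return r_c * L


if __name__ == "__main__":
    eps_c = 1e-3
    p_r = NormalDist(mu=1.0, sigma=0.2)  # hypothetical p(r), for illustration only
    for L in (10.0, 100.0, 1000.0):
        c_u = egalitarian_uncorrelated(L, eps_c)
        c_s = egalitarian_synchronized(L, eps_c, p_r)
        print(f"L={L:7.1f}  uncorrelated (C-L)/C={(c_u - L) / c_u:.3f}  "
              f"synchronized C/L={c_s / L:.3f}")
```

In the uncorrelated case, the unoccupied fraction $(C-L)/C$ shrinks as $L$ grows. In the synchronized case, $C/L$ stays fixed, which is the linear dependence that egalitarian solutions produce.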

The real data corroborate the interpretation that the capacities are closer to
optimal than to egalitarian solutions in three out of four systems analyzed.† This



† The airport data are not considered here because they comprise few data points and lead to a
statistically poor distribution of $C - L$.
Table 1. Unoccupied capacity. Exponents of a power-law curve fitting $C - L \sim L^{\eta}$
for egalitarian solutions and for solutions of the optimization model.
follows from a comparison between the unoccupied capacity $C - L$ calculated from
our optimization model (Figure 6) and the empirical unoccupied capacity (Figure 7).
In the optimization model, $C - L$ increases sublinearly with $L$ (except for the region
of very large loads, where it decreases). This follows directly from (8) for uncorrelated
fluctuations and (11) for synchronized fluctuations. The egalitarian capacity distribution
also shows sublinear behavior for uncorrelated fluctuations, but the scaling exponent is
different from the one obtained from the optimization model (Table 1). This difference
can help determine the origin of the distributions in real networks. For the power-grid
and Internet router networks, $C - L$ grows sublinearly with $L$, consistent with the
predicted optimal solutions, for which the absolute unoccupied capacity
$C - L$ generally increases with $L$ but more slowly than $L^{1/2}$. For the highway
network, $C - L$ is approximately constant over a wide range of loads. This interesting
property of the highway network is an extreme example of the reduction in the portion
of unoccupied capacity in the main elements of the system. For the airway network,
$C - L$ is an approximately linear function of $L$. Although $p(r)$ in (11) cannot be easily
determined within our model, this provides evidence that, in contrast with the other
networks, the air transportation system is dominated by synchronized fluctuations and
operates close to an egalitarian solution. Besides being strongly seasonal, the airway
network is the only system in our dataset that allows for real-time capacity adjustment.
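The log-binning and curve fitting described for Figure 7 can be sketched as follows. Several choices here are assumptions for illustration, because the paper does not state its binning parameters:

* the bin width, four bins per decade;
* the use of arithmetic means inside each bin;
* the synthetic data, which have a known exponent.

Dividing $L$ by $L_{\max}$, as in Figure 7, would only shift the intercept of the log-log fit and would leave the exponent unchanged.

```python
import math
import random


def log_bin(xs, ys, bins_per_decade=4):
    """Average (x, y) pairs inside logarithmically spaced bins of x (requires x > 0)."""
    groups = {}
    for x, y in zip(xs, ys):
        key = math.floor(math.log10(x) * bins_per_decade)
        sx, sy, n = groups.get(key, (0.0, 0.0, 0))
        groups[key] = (sx + x, sy + y, n + 1)
    return [(sx / n, sy / n) for _, (sx, sy, n) in sorted(groups.items())]


def fit_exponent(points):
    """Least-squares slope of log10(y) against log10(x), i.e. eta in y ~ x**eta."""
    lx = [math.log10(x) for x, _ in points]
    ly = [math.log10(y) for _, y in points]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


def demo_exponent(eta=0.4, seed=0):
    """Recover eta from hypothetical noisy data (C - L) = 2 L**eta * noise."""
    rng = random.Random(seed)
    loads = [10 ** (k / 100) for k in range(400)]  # L from 1 to about 1e4
    unoccupied = [2.0 * L ** eta * rng.uniform(0.8, 1.2) for L in loads]
    return fit_exponent(log_bin(loads, unoccupied))


if __name__ == "__main__":
    print(f"fitted exponent: {demo_exponent():.3f}")
```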

An important implication of the observed nonlinearity in the capacity-load relation
is that infrastructure systems appear to have evolved under pressure to minimize
local failures rather than global failures. Previous work [28] has established that the
incidence of large cascading failures can be reduced by removing low-loaded nodes,
even though this causes a concurrent increase in the incidence of local failures.
In the present model this would correspond to a higher probability of overloads $\varepsilon(L)$
for small $L$, which is the opposite of the trend observed in this study. The apparent
vulnerability to large-scale failures is consistent with the absence of global optimization
in real-world infrastructure networks that evolve in a decentralized way. In the case
of the power grid, for example, it has been proposed [40] that the evolution of the
system is driven by the opposing forces of slow load increase and corresponding system
upgrades. That is, the evolution is determined by a dynamic equilibrium near a
point of overloading, which represents an optimized state balancing capacities and the
probability of blackouts. It is likely that a similar self-organization mechanism is at
work in infrastructure systems in general. While providing additional rationale for
the decentralized optimization incorporated in our model, this view emphasizes that in
infrastructure systems local robustness is prioritized over global robustness.

5. Conclusions 

We have presented a unified study of the large-scale pattern of resource allocation 
in diverse real-world infrastructure networks. Our empirical and theoretical analysis 
provides evidence that in all systems analyzed the determination of capacities results 
from a decentralized trade-off between cost and robustness. Our optimization model 
accounts for the perturbations introduced by traffic fluctuations and reveals that system-specific
characteristics of the observed nonlinear behavior of the capacity-load relation
are mainly determined by the weight given to the cost. It is interesting to note, 
however, that the capacity allocation is fairly independent of the details of the network 
structure and traffic dynamics, and it is well expressed as a function of the average load 
at individual network components. By describing both universal and system-specific 
characteristics, our analysis contributes to a unified understanding of self-organized 
patterns driven by decentralized evolution and operation, which is a problem that carries 
implications for the study of complex systems in general. 
Appendix A. Capacity and load versus graph-theoretic centralities

In the study of complex networks, the importance of nodes and links is often estimated 
from graph-theoretic centralities [55] defined by the structure of the network. We have 
compared the empirical data with two widely used centralities, the degree and the 
betweenness centrality [55, 56, 57]:

$$k_i^{(\mathrm{out})} = \sum_j A_{ji}, \qquad k_i^{(\mathrm{in})} = \sum_j A_{ij}, \qquad b(g) = \sum_{i \neq j} \frac{h(i,j;g)}{h(i,j)},$$

where $k_i^{(\mathrm{out})}$ and $k_i^{(\mathrm{in})}$ denote the out-degree and in-degree of node $i$, respectively, and $b(g)$
denotes the betweenness centrality of a node or link represented by $g$. Here $A = (A_{ij})$
is the adjacency matrix, $h(i,j;g)$ is the number of shortest paths from node $i$ to node
$j$ passing through $g$, and $h(i,j)$ is the total number of shortest paths from $i$ to $j$. The
component $A_{ij}$ of the adjacency matrix is defined as 1 if node $i$ has an incoming link
from node $j$ and 0 otherwise.
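Both centralities can be computed directly from the adjacency matrix. The sketch below follows the convention above, where $A_{ij} = 1$ if node $i$ has an incoming link from node $j$. It evaluates the node-betweenness sum with Brandes' algorithm. Brandes' algorithm is a standard technique chosen here for efficiency; it is not a method taken from the paper. The sketch counts only intermediate nodes, excluding the endpoints $i$ and $j$. That is a common convention, but it is an assumption here. Link betweenness, which Figure A1 uses, would accumulate the same dependencies on edges instead of nodes.

```python
from collections import deque


def degrees(A):
    """Return (k_out, k_in), with A[i][j] = 1 if node i has an incoming link from j."""
    n = len(A)
    k_in = [sum(A[i][j] for j in range(n)) for i in range(n)]   # row sums
    k_out = [sum(A[j][i] for j in range(n)) for i in range(n)]  # column sums
    return k_out, k_in


def node_betweenness(A):
    """b(g) = sum_{i != j} h(i, j; g) / h(i, j) for nodes g, via Brandes' algorithm.

    Unweighted directed links; endpoints i and j are not counted as passing through g.
    """
    n = len(A)
    # successors[v] lists the nodes u that receive a link from v (A[u][v] == 1)
    successors = [[u for u in range(n) if A[u][v]] for v in range(n)]
    b = [0.0] * n
    for s in range(n):
        order = []                      # nodes in order of non-decreasing distance from s
        preds = [[] for _ in range(n)]  # shortest-path predecessors
        sigma = [0] * n                 # number of shortest paths from s
        dist = [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for u in successors[v]:
                if dist[u] < 0:
                    dist[u] = dist[v] + 1
                    queue.append(u)
                if dist[u] == dist[v] + 1:
                    sigma[u] += sigma[v]
                    preds[u].append(v)
        delta = [0.0] * n               # dependency of s on each node
        for u in reversed(order):       # farthest nodes first
            for v in preds[u]:
                delta[v] += sigma[v] / sigma[u] * (1.0 + delta[u])
            if u != s:
                b[u] += delta[u]
    return b


if __name__ == "__main__":
    # Directed path 0 -> 1 -> 2: A[1][0] = 1 and A[2][1] = 1.
    A = [[0, 0, 0], [1, 0, 0], [0, 1, 0]]
    print(degrees(A), node_betweenness(A))
```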

Previous studies [57, 58] have found very broad distributions of node and link
betweenness centralities in complex networks, which is also the case for the real-world
networks we have considered here. However, as shown in Figure A1 for the power-grid
and airway networks, whose network topologies are available in our database,
Figure A1. Comparison between empirical load and link-betweenness centrality. The
scatter plots for (a) the power-grid and (b) airway networks indicate that only very weak
correlations exist between the empirical load and the link-betweenness centrality.
Figure A2. Capacity of the links as a function of their end-node degrees $k_i$ and $k_j$ in
(a) the power-grid and (b) airway networks. The capacity is averaged over logarithmic
bins of the degrees. Note that in (a) out-degrees and in-degrees ($k^{(\mathrm{out})}$ and $k^{(\mathrm{in})}$) are
not distinguished since electric power can be transferred in both directions on the same
power transmission line.



the correlation between the empirical load and the betweenness centrality is weak.
The Pearson correlation coefficient between the two quantities is
0.27 for the power-grid network and 0.02 for the airway network. This weak correlation
indicates that transport in real networks is a process considerably different from that
suggested by betweenness centrality.
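The correlation values quoted above are Pearson coefficients. A minimal implementation follows for reference. Its inputs would be paired per-link values, such as load and link betweenness; the sample data in the test are made up. Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r = cov(x, y) / (std(x) * std(y))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Hypothetical per-link (load, betweenness) pairs, for illustration only.
    loads = [1.0, 2.0, 3.0, 4.0, 5.0]
    betweenness = [2.0, 1.0, 4.0, 3.0, 5.0]
    print(f"r = {pearson(loads, betweenness):.2f}")
```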

Figure A2 shows the relationship between the degree, another widely used graph-theoretical
centrality, and the empirical capacity in the power-grid and airway networks.
In the airway network, the behavior of the capacity is comparable with $C \sim (k_i^{(\mathrm{out})} k_j^{(\mathrm{in})})^{\theta}$
found in previous studies [35]. The power-grid network exhibits a stronger deviation
from this power-law behavior, particularly for links with large $k_i k_j$. Therefore, the
distribution of traffic loads and capacities in real networks is indeed more complex than
expected from graph-theoretic centralities, such as betweenness centrality and degree.
Appendix B. Superlinear cost functions

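Generalizing our analysis to nonlinear cost functions [59], we can write the objective
function as

$$F = (1-w)\,\varepsilon(C) + w\left(\frac{C}{\Lambda}\right)^{\nu}, \qquad (B.1)$$

and examine how the overloading probability $\varepsilon$ depends on $\nu$. Here we consider the case
of superlinear functions ($\nu > 1$) in the uncorrelated fluctuation regime.
In this case, $dF/dC = 0$ leads to

$$\frac{C-L}{L^{1/2}} = \left[ 2 \log\left( \frac{1}{\sqrt{2\pi}}\,\frac{1-w}{w\nu}\,\frac{\Lambda^{\nu}}{L^{1/2}\,C^{\nu-1}} \right) \right]^{1/2}, \qquad (B.2)$$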

which indicates that $(C - L)/L^{1/2}$ is a decreasing function of $L$. The decreasing behavior
of $(C - L)/L^{1/2}$ is clear for an increasing function $C(L)$ because the right-hand side of
(B.2) decreases when both $C$ and $L$ increase. If $C(L)$ is a decreasing function, the term
$(C - L)/L^{1/2}$ itself becomes a decreasing function of $L$. The monotonically increasing
behavior of $C(L)$ is generally valid for $C \gg L$. When $C(L)$ approaches $C = L$, where
the overloading probability is almost saturated, the cost function can dominate the
objective function. If that is the case, the minimization of $F$ is achieved by decreasing
$C$ even if $L$ increases. This only happens when $L$ is very large and can be regarded as
an artifact of our choice of cost function. The overloading probability $\varepsilon$ can be
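written explicitly using the error function $\mathrm{erf}(x)$ as

$$\varepsilon = \frac{1}{2}\left[1 - \mathrm{erf}\left(\frac{C-L}{\sqrt{2L}}\right)\right]. \qquad (B.3)$$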

Since we have shown that $(C - L)/L^{1/2}$ is a decreasing function of $L$, and since
$\mathrm{erf}(x)$ is an increasing function of $x$, it is straightforward to show that the overloading
probability $\varepsilon$ is an increasing function of the load $L$. The behavior of $\varepsilon(L)$ is thus
similar to or more pronounced than that for the linear case $\nu = 1$, indicating that the main
results remain valid for $\nu > 1$.
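The monotonic increase of $\varepsilon(L)$ for superlinear costs can be checked numerically. This sketch makes the following assumptions:

* The cost term is $w\,(C/\Lambda)^{\nu}$, where $\Lambda$ is a hypothetical normalization.
* $\varepsilon$ has the Gaussian form $\frac{1}{2}[1 - \mathrm{erf}((C-L)/\sqrt{2L})]$ of the uncorrelated regime.
* The search is restricted to $C \ge L$.
* The parameter values $w = 0.5$, $\nu = 2$ and $\Lambda = 1000$ are arbitrary.

The sketch finds the optimum by bisection on $dF/dC$. Bisection works here because, for $C \ge L$ and $\nu \ge 1$, the Gaussian density decreases and the cost derivative does not decrease, so $dF/dC$ is increasing.

```python
import math


def d_objective(C, L, w, nu, lam):
    """dF/dC for F = (1 - w) eps(C) + w (C / lam)**nu, with Gaussian eps."""
    density = math.exp(-((C - L) ** 2) / (2.0 * L)) / math.sqrt(2.0 * math.pi * L)
    return -(1.0 - w) * density + w * nu * C ** (nu - 1.0) / lam ** nu


def optimal_capacity(L, w, nu, lam):
    """Minimize F over C >= L by bisection on dF/dC (increasing there when nu >= 1)."""
    lo, hi = L, L + 50.0 * math.sqrt(L)  # at hi the Gaussian term is negligible
    if d_objective(lo, L, w, nu, lam) >= 0.0:
        return L  # cost dominates: boundary optimum C = L
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if d_objective(mid, L, w, nu, lam) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def overload_probability(C, L):
    """eps = (1/2) [1 - erf((C - L) / sqrt(2 L))]."""
    return 0.5 * (1.0 - math.erf((C - L) / math.sqrt(2.0 * L)))


if __name__ == "__main__":
    w, nu, lam = 0.5, 2.0, 1000.0  # hypothetical parameter values
    for L in (10.0, 100.0, 1000.0):
        C = optimal_capacity(L, w, nu, lam)
        z = (C - L) / math.sqrt(L)
        print(f"L={L:7.1f}  C={C:9.3f}  (C-L)/sqrt(L)={z:.3f}  "
              f"eps={overload_probability(C, L):.2e}")
```

With these parameters, $(C-L)/L^{1/2}$ falls and $\varepsilon$ rises as $L$ increases, in line with the argument above.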



References 

[1] Gellings C W and Yeager K E 2004 Phys. Today 57 45-51
[2] Huberman B A and Lukose R M 1997 Science 277 535-7
[3] Daganzo C F 1997 Fundamentals of Transportation and Traffic Operations (Oxford: Elsevier)
[4] Odlyzko A M 2003 Review of Network Economics 2 210-37
[5] Newman M E J, Barabasi A-L and Watts D J (eds) 2006 The Structure and Dynamics of Networks (Princeton: Princeton University Press)
[6] Dorogovtsev S N, Goltsev A V and Mendes J F F 2007 Preprint arXiv:0705.0010v6 [cond-mat.stat-mech]
[7] Albert R, Jeong H and Barabasi A-L 2000 Nature 406 378-82
[8] Callaway D S, Newman M E J, Strogatz S H and Watts D J 2000 Phys. Rev. Lett. 85 5468-71
[9] Cohen R, Erez K, ben-Avraham D and Havlin S 2000 Phys. Rev. Lett. 85 4626-9
[10] Cohen R, Erez K, ben-Avraham D and Havlin S 2001 Phys. Rev. Lett. 86 3682-5






[11] Shargel B, Sayama H, Epstein I R and Bar-Yam Y 2003 Phys. Rev. Lett. 90 068701
[12] Sole R V, Rosas-Casals M, Corominas-Murtra B and Valverde S 2008 Phys. Rev. E 77 026102
[13] Arenas A, Díaz-Guilera A and Guimera R 2001 Phys. Rev. Lett. 86 3196-9
[14] Holme P 2003 Adv. Complex Syst. 6 163-76
[15] Toroczkai Z and Bassler K E 2004 Nature 428 716
[16] Noh J D, Shim G M and Lee H 2005 Phys. Rev. Lett. 94 198701
[17] Sreenivasan S, Cohen R, Lopez E, Toroczkai Z and Stanley H E 2007 Phys. Rev. E 75 036105
[18] Watts D J 2002 Proc. Natl. Acad. Sci. USA 99 5766-71
[19] Motter A E and Lai Y-C 2002 Phys. Rev. E 66 065102(R)
[20] Holme P and Kim B J 2002 Phys. Rev. E 65 066109
[21] Moreno Y, Pastor-Satorras R, Vazquez A and Vespignani A 2003 Europhys. Lett. 62 292-8
[22] Crucitti P, Latora V and Marchiori M 2004 Phys. Rev. E 69 045104(R)
[23] Albert R, Albert I and Nakarado G L 2004 Phys. Rev. E 69 025103(R)
[24] Kinney R, Crucitti P, Albert R and Latora V 2005 Eur. Phys. J. B 46 101-7
[25] Zhao L, Park K H, Lai Y-C and Ye N 2005 Phys. Rev. E 72 025104
[26] Lee E J, Goh K-I, Kahng B and Kim D 2005 Phys. Rev. E 71 056108
[27] Kim D-H, Kim B J and Jeong H 2005 Phys. Rev. Lett. 94 025501
[28] Motter A E 2004 Phys. Rev. Lett. 93 098701
[29] Gallos L K, Cohen R, Argyrakis P, Bunde A and Havlin S 2005 Phys. Rev. Lett. 94 188701
[30] Schafer M, Scholz J and Greiner M 2006 Phys. Rev. Lett. 96 108701
[31] Zhao X M and Gao Z Y 2007 Eur. Phys. J. B 59 85-92
[32] Wang B and Kim B J 2007 Europhys. Lett. 78 48001
[33] Li P, Wang B-H, Sun H, Gao P and Zhou T 2008 Eur. Phys. J. B 62 101-4
[34] Buzna L, Peters K, Ammoser H, Kuhnert C and Helbing D 2007 Phys. Rev. E 75 056107
[35] Barrat A, Barthelemy M, Pastor-Satorras R and Vespignani A 2004 Proc. Natl. Acad. Sci. USA 101 3747-52
[36] Guimera R, Mossa S, Turtschi A and Amaral L A N 2005 Proc. Natl. Acad. Sci. USA 102 7794-9
[37] Yook S H, Jeong H, Barabasi A-L and Tu Y 2001 Phys. Rev. Lett. 86 5835-8
[38] Barrat A, Barthelemy M and Vespignani A 2004 Phys. Rev. Lett. 92 228701
[39] Carreras B A, Newman D E, Dobson I and Poole A B 2004 IEEE Trans. Circuits Syst. I: Fundam. Theory Appl. 51 1733-40
[40] Dobson I, Carreras B A, Lynch V E and Newman D E 2007 Chaos 17 026103
[41] Jensen H J 1998 Self-Organized Criticality (Cambridge: Cambridge University Press)
[42] Air Transport Consultancy Services 2003 Airport Capacity/Demand Profiles (Geneva: International Air Transport Association)
[43] Transportation Research Board 2002 Highway Capacity Manual (Washington, DC: Transportation Research Board)
[44] Youn H, Gastner M T and Jeong H 2007 Preprint arXiv:0712.1598v2 [physics.soc-ph]
[45] Guclu H, Korniss G and Toroczkai Z 2007 Chaos 17 026104
[46] Argollo de Menezes M and Barabasi A-L 2004 Phys. Rev. Lett. 92 028701
[47] Argollo de Menezes M and Barabasi A-L 2004 Phys. Rev. Lett. 93 068701
[48] Duch J and Arenas A 2006 Phys. Rev. Lett. 96 218702
[49] Kim D-H and Motter A E 2008 J. Phys. A: Math. Theor. in press (Preprint arXiv:0801.1877v1 [physics.soc-ph])
[50] Karagiannis T, Molle M and Faloutsos M 2004 IEEE Internet Computing 8 57-64
[51] Gnedenko B V and Kolmogorov A N 1954 Limit Distributions for Sums of Independent Random Variables (Cambridge: Addison-Wesley)
[52] Gumbel E J 1958 Statistics of Extremes (New York: Columbia University Press)
[53] Li L, Alderson D, Willinger W and Doyle J 2004 Proc. SIGCOMM'04 (Portland, Oregon, USA) pp 3-14 (http://doi.acm.org/10.1145/1015467.1015470)

[54] McCarthy P S and MaCarthy T 2001 Transportation Economics: Theory and Practice (Malden: Blackwell)

[55] Freeman L C 1977 Sociometry 40 35-41
[56] Newman M E J 2001 Phys. Rev. E 64 016132
[57] Goh K-I, Oh E, Jeong H, Kahng B and Kim D 2002 Proc. Natl. Acad. Sci. USA 99 12583-8
[58] Kim D-H, Noh J D and Jeong H 2004 Phys. Rev. E 70 046126
[59] Aldous D J 2008 J. Stat. Mech. P03006



